Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
Abstract
This paper describes a statistical spectral parameter emphasis technique for HMM-based speech synthesis using mel-scaled line spectral pairs (mel-LSP). Spectral parameter emphasis is effective in compensating for the over-smoothed spectra produced by HMM-based speech synthesis. However, no conventional technique for mel-LSP satisfies requirements such as automatic tuning for different speakers and real-time synthesis. In the proposed method, a cumulative distribution function (CDF) is calculated from the histogram of spectral parameters extracted from the training speech data. A CDF of the spectral parameters generated from the HMMs is constructed in the same manner. An emphasis rule is then trained so that the CDF of the generated parameters matches that of the training data. After a spectral parameter sequence is generated from the HMMs, it is emphasized using this rule. Experimental results show that the proposed method improves speech quality.
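The CDF-matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration of generic histogram equalization, not the paper's actual implementation: the function names (`histogram_cdf`, `train_emphasis_rule`, `apply_emphasis`), the bin count, the use of a single scalar parameter stream, and the synthetic Gaussian data standing in for training and HMM-generated parameters are all assumptions made for this example.

```python
import numpy as np

def histogram_cdf(values, edges):
    """CDF sampled at the bin edges, built from a histogram of `values`."""
    counts, _ = np.histogram(values, bins=edges)
    return np.concatenate(([0.0], np.cumsum(counts))) / counts.sum()

def train_emphasis_rule(train_values, generated_values, n_bins=256):
    """Build a lookup table x -> F_train^{-1}(F_gen(x)) on a shared grid.

    Hypothetical helper: the abstract does not specify how the rule is stored.
    """
    lo = min(train_values.min(), generated_values.min())
    hi = max(train_values.max(), generated_values.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    cdf_train = histogram_cdf(train_values, edges)
    cdf_gen = histogram_cdf(generated_values, edges)
    # A tiny ramp makes cdf_train strictly increasing so it can be inverted
    # by interpolation even where empty bins leave it flat.
    cdf_train = cdf_train + 1e-9 * np.arange(cdf_train.size)
    mapped = np.interp(cdf_gen, cdf_train, edges)
    return edges, mapped

def apply_emphasis(values, rule):
    """Emphasize generated parameters with the trained lookup table."""
    edges, mapped = rule
    return np.interp(values, edges, mapped)

# Synthetic stand-ins (assumption): the "generated" stream has a narrower
# spread than the "training" stream, mimicking over-smoothing.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 20000)
generated = rng.normal(0.0, 0.5, 20000)

rule = train_emphasis_rule(train, generated)
emphasized = apply_emphasis(generated, rule)
print(train.std(), generated.std(), emphasized.std())
```

Because the rule is a precomputed lookup table, applying it at synthesis time costs only one interpolation per parameter value, which is consistent with the real-time requirement the abstract mentions. Training one table per speaker from that speaker's data would be one plausible way to obtain the automatic tuning, though the abstract does not say how this is done.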
Related resources
Histogram Equalization Based Front-end Processing for Noisy Speech Recognition
In this paper, we present Gabor feature extraction for noisy speech recognition, based on front-end processing that uses histogram equalization. The proposed features, named Histogram Equalization of Gabor Bark Spectrum (HeqGBS) features, are extracted using 2-D Gabor processing followed by a histogram equalization step from a spectro-temporal representation of the Bark spectrum of the speech signal...
Advances in Spectral Parameterization for Statistical (HMM-Based) TTS
HMM-based parametric speech synthesis has recently become an alternative to the concatenative TTS approach, especially when a low footprint and a general speech domain are required. The majority of speech parameterization models used in state-of-the-art HMM TTS systems employ source-filter waveform synthesis schemes. Sinusoidal representation and waveform generation of speech is an alternative to the ...
Sinusoidal model parameterization for HMM-based TTS system
A sinusoidal representation of speech is an alternative to the source-filter model. It is widely used in speech coding and unit-selection TTS, but is less common in statistical TTS frameworks. In this work we utilize Regularized Cepstral Coefficients (RCC) estimated on the mel-frequency scale for amplitude spectrum envelope modeling within an HMM-based TTS platform. Improved subjective quality for ...
Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition
A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims at not only compensating for an acoustic mismatch between training and test environments but also reducing the two fundamental limitations of the conventional histogram equalization method, the discrepancy between the phonetic distributions of training and test speech data, and th...